Abstract

Competition between independently arising beneficial mutations is enhanced in spatial populations due to 
the linear rather than exponential growth of the clones. Recent theoretical studies have pointed out that the 
resulting fitness dynamics is analogous to a surface growth process, where new layers nucleate and spread 
stochastically, leading to the build up of scale-invariant roughness. This scenario differs qualitatively from the 
standard view of adaptation in that the speed of adaptation becomes independent of population size while the 
fitness variance does not, in apparent violation of Fisher's fundamental theorem. Here we exploit recent progress 
in the understanding of surface growth processes to obtain precise predictions for the universal, non-Gaussian 
shape of the fitness distribution for one-dimensional habitats, which are verified by simulations. 

1 Introduction 

The appearance of a beneficial mutation in a population and its fixation is the most basic process 
of adaptation. This process determines the rate of evolution, or how quickly populations adapt to 
new environments. If beneficial mutations are very rare, then there is little genetic diversity and the 
adaptation rate is mutation limited. That is, once a new beneficial mutation appeared, it would sweep 
the whole population quickly, and the next mutation would be sufficiently separated in time as to not 
interfere. This regime is generally referred to as periodic selection [1][2].

However, recent microbial experiments suggest that beneficial mutations are more common than
previously thought [3][4][5]. A higher rate of beneficial mutations creates a genetically diverse
population. Coexisting beneficial mutations in different lineages must compete with each other if there
is little or no recombination. In this regime of mutation competition, few beneficial mutations survive,
reducing the rate of evolution, as seen in microbial evolution experiments [6][7]. Fisher's fundamental
theorem equates the rate of evolution with the variance of the fitness distribution [8], which can be
approximated analytically in simplified population genetic models. These recent theoretical analyses
have found that the rate of evolution in large populations of asexuals is not proportional to the total
supply rate of beneficial mutations, but depends much more weakly (logarithmically) on population size
and mutation rate.

* Emory University, Physics Department, Atlanta, Georgia, United States
† University of Pennsylvania, Biology Department, Philadelphia, Pennsylvania, United States
‡ University of Cologne, Institute for Theoretical Physics, Köln, Germany

These analyses were limited to well-mixed populations, where each individual competes with the 
whole population, such as microbes in liquid culture. However, many populations are not well-mixed, but 
are confined in space such that they only compete with a limited neighborhood population, on timescales 
of a generation. Spatial structure, while often neglected as an inconvenient detail, is ubiquitous, from 
plants and animals over large areas of land, to microbes in biofilms [15] to cancer [16][17]. When
mutations are rare, a single beneficial mutation can effectively compete with the whole population, and
the fixation probability is the same in well-mixed and spatially structured populations [18][19]. By
contrast, in the spatial setting with clonal interference, the relationships between population size, the 
speed of evolution, and variance of the fitness distribution are fundamentally different. 

Fisher, Kolmogorov and coworkers [20][21] first described the spread of a beneficial mutation in a
spatially continuous population as a genetic wave. This wave spreads with constant speed which is 
much slower than the exponential growth of a beneficial mutation in a well-mixed population. The slow 
spreading increases the chances that mutations must compete with each other, and reduces the rate 
of evolution. Recent simulations of clonal interference of populations with one- and two-dimensional
spatial structures [22][23][24][25] found the rate of evolution to be even slower than in well-mixed
populations. The rate of evolution does not depend only on the supply of beneficial mutations, but 
becomes independent of system size, and depends on mutation rate as a power law with exponent less 
than one. Intuitively, each location competes with an effective local population, that does not depend 
on the total system size, but depends on the beneficial mutation rate, migration rate and selection 
coefficients. While the speed becomes independent of population size, the variance (in the steady state) 
scales as a power of population size, violating Fisher's theorem. This also implies that there is a 
long transient regime during which the stationary variance builds up, while the speed of adaptation is 
constant. 

Here, we study the transient regime of evolving spatial populations. Starting from uniform (monomor- 
phic) conditions, the fitness variance grows as a power law in time, and then saturates at a value deter- 
mined by the system size (also as a power law) [211 [IS] . This behavior is analogous to surface growth 
models in physics, where particles are deposited on an initially flat surface, which develops roughness over 
time [Ml [23 [28]. Furthermore, the values of the scaling exponents of the fitness variance suggest that
evolution of spatial populations belongs to a class of surface growth models called the Kardar-Parisi-
Zhang (KPZ) universality class [Ml 1301 Ell 131]. By exploiting the equivalence to models of surface
growth, this scenario can be described in great detail, including in particular the non-Gaussian shape 
of the fitness distribution. 

2 Model 

The spatial constraints are realized as a one dimensional lattice of size L with periodic boundary con- 
ditions, where each point represents a single organism that occupies a space [24]. The evolution follows 
standard Wright-Fisher dynamics in discrete generations, where the next fitness of each site is chosen 
randomly from one of the parents in the neighborhood, weighted according to their fitness. The smallest 
possible neighborhood in one dimension is such that the child in the next generation inherits the fitness 
from only two possible parents, that is, the fitness $f_i(t+1)$ of site $i$ at generation $t+1$ is chosen from
either $f_i(t)$ or $f_{i+1}(t)$.
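The two-parent update rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for the figures: the function name `selection_step`, the use of NumPy, and the choice to store log fitness are all introduced here for illustration.

```python
import numpy as np

def selection_step(log_f, rng):
    """One generation of selection on a ring of L sites (illustrative sketch).

    Site i inherits the log fitness of parent i or parent i+1, chosen with
    probability proportional to the parents' fitnesses f = exp(log_f).
    """
    right = np.roll(log_f, -1)  # right[i] == log_f[i+1], periodic boundary
    # P(choose parent i) = f_i / (f_i + f_{i+1}), written in terms of log fitness.
    p_self = 1.0 / (1.0 + np.exp(right - log_f))
    pick_self = rng.random(log_f.size) < p_self
    return np.where(pick_self, log_f, right)

rng = np.random.default_rng(0)
log_f = np.zeros(8)
log_f[5] = 0.05                     # a single mutant with s = 0.05
log_f = selection_step(log_f, rng)  # the mutant clone may spread, persist, or die out
```

Working with log fitness keeps the weights well behaved when fitness differences accumulate over many generations.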

In the case of a homogeneous system of fitness 1, where a single mutant appears with fitness $1 + s$,
the fixation probability for a beneficial mutation is the same as in the well-mixed case, $\pi = 2s$ [18][19].
Intuitively, the fixation probability is unaffected because a single mutation has ample time to compete 
with the entire system, regardless of spatial structure. Since the fixation probability is the same, the 
speed of evolution in the periodic selection regime is the same as in the well-mixed case. What is different 
is the timescale of fixation. 

The boundary between two domains with different fitnesses is a biased random walker, and the speed 
of this walker is the expected value of its displacement after one time step, $c = s/2$ for small $s$. In the
continuum limit, this model corresponds to a special case of the more general stochastic Fisher equation
(or SFKPP equation) ^31 1311 [33 , where it is possible to have traveling waves with speed $c \sim s$ in the
strong noise regime, or $c \sim \sqrt{s}$ in the weak noise regime. However, the dependence of the wave speed
on s does not change the essential features. 

Importantly, the time for fixation may be much longer in the presence of spatial structure than in
well-mixed populations. A wave spreading with finite speed $c$ will take time $t_{\mathrm{fix}} \sim L/c$ to cover
the whole system (with total population size $N \sim L$), as opposed to a well-mixed population where
$t_{\mathrm{fix}} \sim \log(N)$. The slow spread of mutations makes it more likely that many clones exist simultaneously
in large systems. A site may also contain more than one organism, in which case $c$ is different, but this
does not change the overall results [25] (unless interference happens within one site).

Since we are interested in the rate of evolution during competition, a steady rate of beneficial
mutations is supplied, akin to a population adapting to a new environment. Beneficial mutations appear
randomly at rate $U$ per site per generation (harmful mutations are unlikely to survive and are neglected).
We assume that mutations have independent effects, with no epistasis, and therefore increase the fitness
according to $\log f' = \log f + s$, where $s$ is a constant on the order of 1%.
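The mutation rule $\log f' = \log f + s$ translates directly into an additive update of the log fitness array. The sketch below is illustrative only: the function name `mutation_step` and the `exponential` flag are introduced here and are not taken from the original simulations. The flag draws effects from an exponential distribution with mean $s$, as one possible reading of a model with random selection coefficients.

```python
import numpy as np

def mutation_step(log_f, U, s, rng, exponential=False):
    """Add beneficial mutations at rate U per site per generation (sketch).

    Each mutated site gains log fitness s (no epistasis). With
    exponential=True the gain is instead drawn from an exponential
    distribution with mean s (an illustrative variant).
    """
    hits = rng.random(log_f.size) < U  # which sites mutate this generation
    if exponential:
        effects = rng.exponential(scale=s, size=log_f.size)
    else:
        effects = np.full(log_f.size, s)
    return log_f + hits * effects

rng = np.random.default_rng(1)
log_f = mutation_step(np.zeros(1024), U=1e-3, s=0.05, rng=rng)
```

In a full simulation this step would alternate with the selection step, once per generation.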

When the waiting time for a mutation to appear and become established, $t_{\mathrm{mut}} = (\pi U L)^{-1}$, is much
longer than $t_{\mathrm{fix}}$, the rate of average fitness increase is mutation limited: $V = s \pi U L = 2 s^2 U L$. However,
when $t_{\mathrm{mut}} \sim t_{\mathrm{fix}}$, multiple unfixed mutations in the population compete with each other, slowing down
$V$. In well-mixed populations the condition for mutation limited adaptation is that there should be less
than one new beneficial mutation per generation. In contrast, with spatial structure $t_{\mathrm{mut}} \sim t_{\mathrm{fix}}$ defines
a characteristic interference length scale $L_c \sim (c/U)^{1/2}$, above which mutation competition sets in. In
this competitive regime, the rate of evolution no longer depends on the supply of beneficial mutations,
but $V$ becomes independent of $L$ for $L > L_c$ [SUES]. Using this observation and dimensional analysis,
one may deduce that this maximum speed grows as $U^{1/2}$ in one dimension, and $U^{1/3}$ in two dimensions.
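The scaling of the speed limit can be made explicit with a short worked estimate. It rests on two assumptions not spelled out in the text: the speed saturates at roughly the mutation-limited value evaluated at $L = L_c$, and a two-dimensional habitat of linear size $L$ holds $L^2$ sites. Prefactors of order one are dropped, and $V_{\max}$ is a label introduced here.

```latex
% One dimension: t_mut = 1/(\pi U L), t_fix = L/c.
\[
  \frac{1}{\pi U L_c} = \frac{L_c}{c}
  \quad\Rightarrow\quad
  L_c = \left(\frac{c}{\pi U}\right)^{1/2},
  \qquad
  V_{\max} \approx s \pi U L_c = s\,(\pi c\, U)^{1/2} \propto U^{1/2}.
\]
% Two dimensions (assumed L^2 sites): t_mut = 1/(\pi U L^2), t_fix = L/c.
\[
  \frac{1}{\pi U L_c^{2}} = \frac{L_c}{c}
  \quad\Rightarrow\quad
  L_c = \left(\frac{c}{\pi U}\right)^{1/3},
  \qquad
  V_{\max} \approx s \pi U L_c^{2} = s\,(\pi U)^{1/3} c^{2/3} \propto U^{1/3}.
\]
```

With $\pi = 2s$ and $c = s/2$ the ratio $c/\pi$ is a constant, so in one dimension $L_c$ depends on the parameters only through $U^{-1/2}$, in line with $L_c \sim (c/U)^{1/2}$.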

In the following we describe the fitness distribution in the transient regime of the evolution, before 
reaching the steady state, by exploiting an analogy to surface growth physics. 

3 Results 

The rough spatial profile of the fitness resembles a typical surface seen in surface growth models [HJ [57] . 
In surface growth, particles are deposited on an initially smooth surface randomly, and they may diffuse 
or stick to each other, gradually forming a rough surface. Many simple models of surface growth were 
studied by statistical physicists interested in non-equilibrium systems [571 HH]. They discovered that
a large number of models share the same properties in the continuum, long-time limit, where many of 
the microscopic details of the model do not matter, and these classes of models, or universality classes, 
share the same symmetries. 

The evolutionary model defined here is equivalent to a surface growth model called polynuclear 
growth [3S1[371[3H] (PNG), in the limit $s \to \infty$. In PNG, the process of surface growth may be divided
into two parts, nucleation (mutation), and spreading (selection). Nucleation occurs with low probability 
at any point, at a certain rate, U, which corresponds to adding a small block of height to the surface 
(log fitness). The nucleated block then grows laterally forming a new layer. Depending on the size of the 
lattice, the surface grows layer by layer (corresponding to the periodic selection regime) or the surface 
roughens due to multiple simultaneous nucleation events (corresponding to clonal interference) [36][37].
In the rough regime the PNG model belongs to the universality class of growth processes described on
large length and time scales by the KPZ equation, a nonlinear stochastic partial differential equation
[29][38].
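To make the nucleation and spreading picture concrete, here is a hedged sketch of the deterministic limit of the two-parent rule: when selection is infinitely strong, each site simply takes the larger of the two parental heights, and nucleation then adds one layer with probability $U$. The function name `png_limit_step` and the integer height representation are assumptions made for illustration. Standard lattice versions of PNG usually spread symmetrically to both neighbors.

```python
import numpy as np

def png_limit_step(h, U, rng):
    """Deterministic spreading plus random nucleation on a ring (sketch).

    h[i] counts the layers (mutations) at site i. Spreading takes the
    maximum over parents i and i+1; nucleation adds one layer with
    probability U per site.
    """
    spread = np.maximum(h, np.roll(h, -1))  # the fitter parent always wins
    nucleate = rng.random(h.size) < U       # new layers (mutations)
    return spread + nucleate.astype(h.dtype)

rng = np.random.default_rng(3)
h = np.zeros(64, dtype=int)
for _ in range(100):
    h = png_limit_step(h, U=0.01, rng=rng)
```

Because the spreading here is deterministic, a nucleated layer can never be lost, unlike a new mutation in the stochastic evolutionary model.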
Figure 1: (a) Variance of the log fitness distribution as a function of time for different system sizes, L ~ 2^'^
(green circles), L — 2^^ (blue squares), and L — 2^^ (red crosses), with s = 0.05 and V — 10^^. After a transient 
regime, $\sigma^2$ saturates at a value that depends on $L$. (b) When the data is rescaled as $\sigma^2/L$ and $t/L^{3/2}$ it collapses
onto a single curve, indicating that in fact $\sigma(t) \sim t^{1/3}$ and $\sigma(t \to \infty) \sim L^{1/2}$, which are the scaling exponents
predicted by KPZ theory.

While in PNG the spreading is fast and deterministic, in the evolutionary model it is stochastic, 
and the new layer may even disappear. The boundaries may collide with each other, and they either 
annihilate or stack up creating differences in log fitness greater than s. From the point of view of surface 
growth it is natural to hypothesize that the universal features of the PNG model are robust with respect 
to these differences, but this has to be verified by explicit simulations. The test of the universality 
hypothesis proceeds in two steps. First, one estimates the scaling exponents governing the power law 
dependence of the standard deviation of the surface height (or log fitness) distribution on time and 
system size. Second, the shape of the full distribution of height fluctuations is considered. 

In surface growth, starting from flat initial conditions, the standard deviation of the surface height
distribution grows in time as $\sigma(t) \sim t^{\beta}$, where $\beta$ is the growth exponent, then reaches a steady state
when the correlation length reaches the size of the system [27][39]. In the steady state, $\sigma(t \to \infty) \sim L^{\alpha}$,
where $\alpha$ is the saturation exponent. Figure 1(a) confirms this scenario for the evolution model. The
crossover time is where saturation sets in (the elbow), and it scales as $L^{\alpha/\beta}$. One may try to measure
the exponents from the simulations, but based on the similarity to the PNG model one expects that the
scaling exponents are those of the one-dimensional KPZ equation, $\alpha = 1/2$, $\beta = 1/3$ and $\alpha/\beta = 3/2$.
Figure 1(b) shows that the data indeed collapses when plotted as $\sigma^2/L$ versus $t/L^{3/2}$. In the evolutionary
context the saturation time scale $\sim L^{3/2}$ is proportional to the fixation time of beneficial mutations [25].
Note that these values of the exponents characterize the asymptotic, long time and large scale behavior
of the model, and the behavior in the pre-asymptotic regime may be somewhat different [24].
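The data collapse amounts to dividing time by $L^{\alpha/\beta}$ and the variance by $L^{2\alpha}$. The helper below is a minimal sketch of that rescaling. The name `collapse` and its default exponents are chosen here for illustration, and the check uses a made-up scaling function rather than simulation output.

```python
import numpy as np

def collapse(t, var, L, alpha=0.5, beta=1.0 / 3.0):
    """Rescale (t, variance) so that curves for different L can be overlaid.

    With the KPZ values alpha = 1/2 and beta = 1/3 this returns
    t / L**1.5 and var / L.
    """
    z = alpha / beta  # exponent of the crossover time
    return np.asarray(t) / L**z, np.asarray(var) / L**(2 * alpha)

# Synthetic check with a hypothetical scaling function g(u).
u = np.logspace(-2, 2, 20)
g = u**(2 / 3) / (1 + u**(2 / 3))  # grows as u^(2*beta), then saturates
for L in (64, 256):
    x, y = collapse(u * L**1.5, L * g, L)
    assert np.allclose(x, u) and np.allclose(y, g)
```

If the exponents are right, curves for different $L$ coincide after rescaling; systematic deviations at early times would point to pre-asymptotic corrections.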
Over the past decade, a much more refined characterization of the KPZ universality class has been
developed that extends beyond the values of the scaling exponents $\alpha$ and $\beta$ to the full probability
distribution of surface height fluctuations [301 Ell [31]. The essence of this refined universality hypothesis
is that the log fitnesses (or surface heights) can be written as

$$\log f_i(t) = V t + (\Gamma t)^{1/3} \chi, \qquad (1)$$

where $\chi$ is a random variable from one of the Tracy-Widom (TW) distributions, $V$ is the long-time
growth rate, and $\Gamma$ is a constant related to the parameters of the KPZ equation [301110]. From eq. (1)
we find the width of the distribution:

$$\sigma^2 = \mathrm{var}(\log f_i) = (\Gamma t)^{2/3} \, \mathrm{var}(\chi). \qquad (2)$$

The TW distributions were first discovered in fluctuations around the largest eigenvalues of random 
matrices [H]. The relation to the PNG model was established by mapping the PNG surface height 
to the length of the longest increasing subsequence of random permutations [HI 35] , and subsequently 
TW universality was derived directly from the KPZ equation [351133]. Remarkably, the distributions 
were found to be geometry dependent, with the flat (monomorphic) initial condition leading to the TW 
distribution characteristic of random matrices from the Gaussian orthogonal ensemble (GOE). 

Here we show numerically that, despite the additional randomness of the stochastic spreading, the
distribution of fitnesses in the non-stationary regime of the spatial evolution model is a TW distribution
characteristic of the KPZ universality class. One signature of the TW distributions can be seen by
measuring higher moments, such as the skewness, $\langle (h - \langle h \rangle)^3 \rangle / \sigma^3$, and the excess kurtosis,
$\langle (h - \langle h \rangle)^4 \rangle / \sigma^4 - 3$, with $h = \log f_i$,
which do not depend on the parameters $V$ and $\Gamma$. Figure 2 shows that the skewness and kurtosis of the
fitness distributions are non-zero, indicating non-Gaussianity, and they approach the known values of
the GOE TW distribution.
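Both moment ratios are simple to estimate from a sample of log fitnesses. The sketch below uses the standard moment definitions. The function name is introduced here, and the two reference constants are the commonly tabulated values for the GOE Tracy-Widom distribution, rounded to four decimals and quoted from the literature rather than computed.

```python
import numpy as np

# Commonly tabulated GOE Tracy-Widom values (rounded, quoted from the literature).
TW_GOE_SKEWNESS = 0.2935
TW_GOE_EXCESS_KURTOSIS = 0.1652

def skewness_and_excess_kurtosis(x):
    """Return (skewness, excess kurtosis) of a sample using central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2 - 3.0
    return skew, kurt

# Both ratios are unchanged by shifting and rescaling the sample,
# which is why they do not depend on V or Gamma.
a = skewness_and_excess_kurtosis([0.0, 0.0, 0.0, 1.0])
b = skewness_and_excess_kurtosis([5.0, 5.0, 5.0, 8.0])
assert np.allclose(a, b)
```

For a Gaussian sample both quantities tend to zero, so values that stay close to the constants above as the sample grows indicate non-Gaussian, TW-like fluctuations.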

Figure 2: Skewness and kurtosis of the fitness distributions from 200 simulations and known values for the GOE
Tracy-Widom distribution. L = 2", s = 0.05, U = IQ-^.

It is also possible to compare the fitness distribution directly to the TW distribution. The parameters
$V$ and $\Gamma$ can be found from the simulation data by applying linear regression to the means of equations
(1) and (2). The fitnesses from the simulation are then rescaled as

$$\chi = \frac{\log f_i(t) - V t}{(\Gamma t)^{1/3}}. \qquad (3)$$

Figure 3 shows that in the non-stationary regime, the fitnesses fall onto the universal GOE TW
distribution, which is skewed towards higher fitnesses, with tail behaviors $-\ln P(\chi) \sim \chi^{3/2}$ for
$\chi \to \infty$ and $-\ln P(\chi) \sim |\chi|^3$ for $\chi \to -\infty$. To demonstrate the robustness of this result, we simulated a variant of the
model where the selective advantage of beneficial mutations, $s$, is a random variable generated from an
exponential distribution, a common choice in this field [101 [El [13]. The two data sets can be seen to be
indistinguishable.
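One possible way to carry out the fit and the rescaling is sketched below; it is an assumption about the procedure, not a description of the original analysis. Cubing the standard deviation in eq. (2) gives $\sigma^3 = \Gamma t \, \mathrm{var}(\chi)^{3/2}$, which is linear in $t$, and the mean of eq. (1) then fixes $V$. The mean and variance of the reference distribution (`chi_mean`, `chi_var`) must be supplied by the user, since their values depend on the normalization convention adopted for $\chi$. All function names are illustrative.

```python
import numpy as np

def fit_V_and_Gamma(t, mean_logf, var_logf, chi_mean, chi_var):
    """Estimate V and Gamma by least-squares lines through the origin (sketch)."""
    t = np.asarray(t, dtype=float)
    sigma3 = np.asarray(var_logf, dtype=float) ** 1.5
    # sigma^3 = Gamma * t * chi_var^(3/2), so the slope gives Gamma.
    gamma = np.sum(t * sigma3) / np.sum(t * t) / chi_var**1.5
    # <log f> - (Gamma t)^(1/3) <chi> = V t, so the slope gives V.
    resid = np.asarray(mean_logf, dtype=float) - (gamma * t) ** (1 / 3) * chi_mean
    V = np.sum(t * resid) / np.sum(t * t)
    return V, gamma

def rescale(log_f, t, V, gamma):
    """Invert eq. (1): chi = (log f - V t) / (Gamma t)^(1/3)."""
    return (np.asarray(log_f, dtype=float) - V * t) / (gamma * t) ** (1 / 3)

# Synthetic self-consistency check with made-up parameter values.
t = np.arange(1, 50) * 100.0
V0, G0, cm, cv = 0.01, 0.002, -1.2, 1.6
V, G = fit_V_and_Gamma(t, V0 * t + (G0 * t) ** (1 / 3) * cm,
                       (G0 * t) ** (2 / 3) * cv, cm, cv)
assert abs(V - V0) < 1e-10 and abs(G - G0) < 1e-10
```

A histogram of the rescaled values can then be overlaid on the reference density. Only the transient regime, before the variance saturates, should enter the fit.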

In addition, two other initial conditions were simulated. The droplet geometry in the PNG model 
is when the initial condition is a single nucleation site, with no additional nucleations (or mutations) 
allowed outside. The boundary of the initial seed grows over time, making the fitness profile curved. The 
deviations from this curved profile converge to the TW distribution of the Gaussian unitary ensemble 
(GUE) [30][31][42]. The droplet geometry has an interesting evolutionary analogy: It corresponds to a
mutation that raises the mutation rate significantly (a mutator strain), and competes with a population 
that has essentially no mutations. 

The third initial condition corresponds to a system with fully developed, stationary diversity (surface
roughness). In this case the distribution of the deviations from the initial fitness profile is predicted to
converge to a universal distribution $F_0$, which does not appear directly in random matrix theory but is
closely related to the TW distributions [42]. Again, the data fall nicely onto the predicted distribution.

Figure 3: Scaled fitness distributions for three different initial conditions: flat (blue squares and circles), droplet
(orange crosses), and rough (green diamonds). Lines indicate the Tracy-Widom GOE (blue solid), GUE (orange
dotted), and $F_0$ (green dashed) distributions, respectively (calculated using [45]). The scaled fitness distributions
were taken from simulations after 10^ generations averaged 200 times, with L = 2^*, U = 10^^ and s = 0.05, except
the blue squares, which had exponentially distributed selection coefficients with mean $\langle s \rangle = 0.05$. For rough initial
conditions, the simulation was first run to the steady state ($L^{3/2}$ generations), and deviations from the initial
condition were calculated. For the droplet geometry, a single mutation was first allowed to establish, and mutations
were only allowed in that lineage. The exact shape of the droplet is unknown, so only fitnesses from the position of
the initial mutation (the peak of the droplet) were used in the distribution.

4 Discussion

The concept of effective population size has long been useful in population genetics in many contexts, as
a quantity that may be inferred from an idealized model. When considering the effective population size
with spatial structure one is faced with two natural choices: the total population size, and a population
size per length (or area). Our results indicate that the right answer depends on the situation. If the
system is small enough, individuals have time to compete with everyone, and the system is effectively
well-mixed. Above the interference length $L_c$, the rate of evolution does not depend on the total
population, and it is appropriate to consider only a neighborhood population: $L_c$ times the density.
While it is still possible to write that $V \sim U N_{\mathrm{eff}} s$ with $N_{\mathrm{eff}} \sim L_c$, this relation is not very informative,
because the effective population size depends on $U$ itself such that in the end $V \sim U^{1/2}$.

Fisher's fundamental theorem states that the speed of evolution is equal to the variance of the fitness
distribution. In the spatial model, there is a speed limit for large system sizes, while the variance
grows linearly with $L$, and $V \neq \sigma^2$. It may seem as though Fisher's fundamental theorem is violated.
However, in this case it makes sense to consider the local population rather than the total population.
The variance of the fitness distribution for the local neighborhood of size $L_c$ does not change with $L$,
and Fisher's theorem still holds in this sense.

The model presented here has the scaling exponents and universal distribution that belong to the 
KPZ universality class. This is perhaps not so surprising given the similarity to the PNG model, which 
belongs to this class. Universality implies that the model is robust, because many of the details, such 
as the wave speed and the distribution of selection coefficients, do not change the scaling behavior and 
the fitness distribution. 

Knowing the universality class has implications for generalizations of the model. For example, based 
on our understanding of KPZ-type surface growth processes, it is expected that the saturation of the 
speed holds in any habitat dimension and for a broad class of distributions of selection coefficients, 
including those that are fatter than exponential. A recent simulation study has investigated a range of 
KPZ-models in two-dimensional (planar) habitats and identified a set of geometry-dependent universal 
distributions that are qualitatively similar to those found in the one-dimensional case [46].

Spatial evolution models in planar habitats have been considered in the context of cancer progression,
where the distribution of waiting times until the occurrence of a given number $k$ of mutations is of
central interest [17]. In the surface growth analogy, this corresponds to the time when the surface reaches
a given height. Using the probabilistic concept of first passage percolation, it can be shown that such
waiting times in KPZ-type growth processes again follow KPZ statistics [5511301. This implies that the
distribution of 'waiting times to cancer', which was argued in [17] to be Gaussian for small $k$, should
asymptotically approach the two-dimensional analogue of the TW distribution found in [46].